A robust method to count, locate and separate audio sources in a multichannel underdetermined mixture
Authors
Abstract
We propose a method to count the sources and to estimate both the mixing directions and the sources in an underdetermined multichannel mixture. Like DUET-type methods, the approach is based on the hypothesis that the sources have time-frequency representations with limited overlap. However, instead of assuming essentially disjoint representations, we only assume that, in the neighbourhood of some time-frequency points, only one source contributes to the mixture: such time-frequency points can provide robust local estimates of the corresponding source direction. At the core of our contribution is a local confidence measure, inspired by the work of Deville on TIFROM, which detects the time-frequency regions where such robust information is available. A clustering algorithm called DEMIX is proposed to merge the information from all time-frequency regions according to their confidence level. Two variants are proposed, one for instantaneous mixtures and one for anechoic mixtures. In the latter case, to overcome the intrinsic ambiguities of phase unwrapping as met with DUET, we propose a technique similar to GCC-PHAT to estimate time-delay parameters from phase differences between time-frequency representations of different channels. The resulting method is shown to be robust in conditions where all comparable DUET-like methods fail: a) when time-delays largely exceed one sample; b) when the source directions are very close. As an example, experiments show that, in more than 65% of the tested stereophonic mixtures of six speech sources, DEMIX-Anechoic correctly estimates the number of sources and is more accurate than DUET, with a distance error 10 times lower.
Key-words: blind source separation, multichannel audio, direction of arrival, delay estimation, sparse component analysis
This work has been submitted to the IEEE Transactions on Signal Processing for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
∗ CNRS † INRIA
inria-00305435, version 2 - 4 Aug 2008
Une méthode robuste pour compter, localiser et séparer les sources audio d’un mélange multicanal sous-déterminé
Résumé (translated from French): We propose a method to count and estimate the directions and the sources of an underdetermined multichannel mixture. As with DUET-type methods, the approach is based on the hypothesis that the sources have time-frequency representations that overlap little. However, rather than assuming that the representations have disjoint supports, we only assume that, in the neighbourhood of some time-frequency points, only one source contributes to the mixture: such time-frequency points can provide robust local estimates of the corresponding source directions. One of our major contributions is a local confidence measure, inspired by the work of Deville on TIFROM, which detects the time-frequency regions where such information is available. We propose a clustering algorithm called DEMIX that processes the information from all time-frequency regions according to their confidence level. Two variants of the algorithm are proposed to handle the instantaneous case and the anechoic case. In the latter case, to resolve the intrinsic phase-wrapping problem encountered in DUET, we propose a technique close to GCC-PHAT to estimate the delay parameters from the phase differences between the time-frequency representations of the different channels. The resulting method proves robust in situations where DUET-type methods fail: a) when the delays are much greater than one sample; b) when the source directions are close. Experiments show that, for a stereophonic mixture of six sources, DEMIX-Anechoic correctly estimates the number of sources in more than 65% of cases, and obtains direction estimates with a mean error more than 10 times lower than that of DUET.
Mots-clés (translated): blind source separation, multichannel audio, direction of arrival, delay estimation, sparse component analysis
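The abstract says that the anechoic variant estimates time delays with a technique similar to GCC-PHAT, so that delays well beyond one sample can be handled. As a rough illustration only, the sketch below implements plain textbook GCC-PHAT on two whole channels; it is not the local time-frequency procedure used in DEMIX, and the function name gcc_phat_delay and its parameters are hypothetical choices made for this example.

```python
import numpy as np


def gcc_phat_delay(x1, x2, max_delay):
    """Estimate the delay (in samples) of x2 relative to x1 using GCC-PHAT.

    Illustrative sketch: a positive result means x2 lags behind x1.
    """
    if max_delay < 1:
        raise ValueError("max_delay must be at least 1")

    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-spectrum; its phase carries the inter-channel delay.
    cross = X2 * np.conj(X1)

    # PHAT weighting: discard magnitudes and keep only the phase.
    cross /= np.maximum(np.abs(cross), 1e-12)

    # Back to the lag domain: the peak position is the delay estimate.
    cc = np.fft.irfft(cross, n=n)

    # Reorder so that the lags run from -max_delay to +max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(cc)) - max_delay


if __name__ == "__main__":
    # Synthetic two-channel example: the second channel is a delayed copy.
    rng = np.random.default_rng(0)
    source = rng.standard_normal(4096)
    true_delay = 23  # assumed delay, far larger than one sample
    delayed = np.concatenate((np.zeros(true_delay), source[:-true_delay]))
    print("estimated delay:", gcc_phat_delay(source, delayed, max_delay=64))
```

Because the peak is searched in the lag domain rather than read off an unwrapped phase, the estimate is not limited to delays below one sample, which is the ambiguity the abstract attributes to DUET.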
Related resources
Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation Factorisation en matrices à coefficients positifs de données multicanal convolutives pour la séparation de sources audio
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the Short-Time Fourier Transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegativ...
Strong sea clutter removal using the DUET BSS algorithm
Abstract: Suppressing clutter is one of the most crucial phases in radar signal processing. Blind Source Separation (BSS) is also a recent and very important problem in signal processing that has proven effective in many applications. The Degenerate Unmixing Estimation Technique (DUET) is one of the underdetermined BSS algorithms that separate sources from mixtures using only two mixtures...
Multichannel audio signal source separation based on an Interchannel Loudness Vector Sum
In this paper, a Blind Source Separation (BSS) algorithm for multichannel audio contents is proposed. Unlike common BSS algorithms targeting stereo audio contents or microphone array signals, our technique is targeted at multichannel audio such as 5.1 and 7.1ch audio. Since most multichannel audio object sources are panned using the Inter-channel Loudness Difference (ILD), we employ the ILVS (I...
A Robust Method to Count and Locate Audio Sources in a Stereophonic Linear Instantaneous Mixture
We propose a robust method to estimate the number of audio sources and the mixing matrix in a linear instantaneous mixture, even with more sources than sensors. Our method is based on a multiscale Short Time Fourier Transform (STFT), and relies on the assumption that in the neighborhood of some (unknown) scales and time-frequency points, only one source contributes to the mixture. Such time-fre...
Perceptually controlled doping for audio source separation
The separation of an underdetermined audio mixture can be performed through sparse component analysis (SCA) that relies however on the strong hypothesis that source signals are sparse in some domain. To overcome this difficulty in the case where the original sources are available before the mixing process, the informed source separation (ISS) embeds in the mixture a watermark, which information...
Publication date: 2008